Processor having a RAT state history recovery mechanism

ABSTRACT

A mechanism is provided for allowing a processor to recover from a failure of a predicted path of instructions (e.g., from a mispredicted branch or other event). The mechanism includes a plurality of physical registers, each of which can store either architectural data or speculative data. The apparatus also includes a primary array to store a mapping from logical registers to physical registers, the primary array storing a speculative state of the processor. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data. According to another embodiment, a history buffer is coupled to a secondary array and stores historical physical register to logical register mappings performed for each of a plurality of instructions that are part of a predicted path. The secondary array is movable to a particular speculative state based on the mappings stored in the history buffer, such as to a location where a path failure may occur. The secondary array can then be copied to the primary array when a failure is detected in a predicted path of instructions near where the secondary array is located, allowing the processor to recover from the predicted path failure.

FIELD

[0001] The invention generally relates to processors, and in particular to a RAT state history recovery mechanism.

BACKGROUND

[0002] In some current processors, instructions are decoded into one or more micro-operations (uops), and each uop is loaded into a re-order buffer (ROB) to await scheduling for execution. A register alias table (RAT) is provided for storing a mapping or aliasing between logical registers and physical registers. The physical registers include the real register file (RRF) for storing retired data, and include the ROB for storing temporary or unretired data. After a uop is executed, the execution result is temporarily stored in the ROB. Uops are retired (or committed to architectural state) in order by physically moving the execution result (data) from the ROB to the RRF, and updating a pointer in the RAT for the corresponding logical register. An example of this type of processor is described in U.S. Pat. No. 5,727,176. However, this configuration has limitations. As execution units and other portions of the processor increase in speed, it becomes more difficult to physically move the data at retirement from the ROB to the RRF. A better technique is needed to keep track of temporary and retired data in the processor.

[0003] U.S. Pat. No. 5,197,132 (the '132 patent) discloses a register mapping system having a log containing a sequential listing of registers that were changed in preceding cycles for post-branch recovery. A register map includes a predicted map and a backup map, with each map storing a mapping to the physical home of each logical register. Muxes are provided in the '132 patent for selecting between the two maps for use. However, this arrangement is cumbersome and requires significant silicon due to the muxing between the two maps, and because data output paths are connected to each map. Moreover, the mapping circuit in the '132 patent is inflexible as it requires the backup map to maintain a particular minimum distance (e.g., 20 clock cycles) behind the predictive map to allow the processor to confirm that the first instruction does not cause an event that requires the register map to be backed up to an earlier state using the backup map. Thus, the '132 patent discloses a restrictive and inflexible approach. As a result, there is a need for a more flexible and effective technique for keeping track of the temporary and permanent data in the processor.

SUMMARY

[0004] According to an embodiment of the present invention, an apparatus is provided for allowing a processor to recover from a failure of a predicted path of instructions. The apparatus includes a plurality of physical registers, each physical register to store either architectural data or speculative data. The apparatus also includes a primary array to store a speculative state of the processor including mappings from logical registers to physical registers. The apparatus also includes a buffer coupled to the primary array to store information identifying which physical registers store architectural data and which physical registers store speculative data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] The foregoing and a better understanding of the present invention will become apparent from the following detailed description of exemplary embodiments and the claims when read in connection with the accompanying drawings, all forming a part of the disclosure of this invention. While the foregoing and following written and illustrated disclosure focuses on disclosing example embodiments of the invention, it should be clearly understood that the same is by way of illustration and example only and is not limited thereto. The spirit and scope of the present invention are limited only by the terms of the appended claims.

[0006] The following represents brief descriptions of the drawings, wherein:

[0007] FIG. 1 is a block diagram illustrating a portion of a processor according to an embodiment of the present invention.

[0008] FIG. 2 is a diagram illustrating a RAT primary array and a register file (RF) according to example embodiments of the present invention.

[0009] FIG. 3 is a diagram of a history buffer according to an example embodiment of the present invention.

[0010] FIG. 4 is a diagram illustrating a RAT shadow array according to an example embodiment of the invention.

[0011] FIG. 5 is a diagram of a history buffer and a RAT primary array for three example steps.

[0012] FIG. 6 is a diagram of a history buffer and a RAT primary array for three example steps according to an example embodiment of the present invention.

[0013] FIG. 7 is a diagram of a history buffer, a RAT primary array and a RAT shadow array for three more example steps according to an example embodiment of the present invention.

[0014] FIG. 8 is a diagram of a history buffer, a RAT primary array and a RAT shadow array for three additional example steps according to another example embodiment of the present invention.

DETAILED DESCRIPTION

[0015] According to an embodiment of the invention, the processor described herein is a speculative machine. If a branch instruction is encountered, prediction logic in the processor predicts whether the branch will be taken. The branch prediction logic is thus used to determine a predicted path for speculatively fetching uops. Instructions are speculatively fetched from memory and decoded into one or more uops in order. The uops can then be executed out of order. According to an embodiment of the invention, uops can even be speculatively executed before their source data is available. The uops are then retired in order.

[0016] Rather than storing the temporary (unretired) data and the permanent (retired) data in separate locations or files, the temporary and permanent data are stored together (intermixed) in a single register file (RF). The register file (RF) is an array of physical registers or RF entries, which stores both temporary and permanent data. Thus, because the temporary and permanent data are both stored (intermixed) in a single register file, it is unnecessary to physically move the data at retirement, according to an embodiment of the invention.

[0017] A RAT primary array is provided that stores a mapping from the processor logical registers to physical registers (i.e., register file entries). The RAT primary array stores or reflects a current or working state of the processor. According to an embodiment of the invention, the primary array reflects a current and most speculative state of the processor. When a uop is decoded, an allocator allocates an available (or free) RF entry as a physical destination (Pdst) for the execution results of the uop. According to an embodiment, the RAT primary array is the only array that is used by the processor to identify the mappings from logical registers to physical registers (of the current state of the processor). According to an embodiment, the current state or most speculative state of the processor (which is reflected in the RAT primary array) is at the point of allocation (where an RF entry is allocated to the next uop or instruction).

[0018] According to an embodiment of the invention, a RAT shadow array and a history buffer are provided. The history buffer is an array that stores historical state information of the logical and physical registers that allows a uop to be done (performed) or undone (reversed), as reflected in the primary array. The successive mappings from logical registers to physical registers as allocated for each uop are recorded in the history buffer. The history buffer also includes a jump color path field to allow the processor to distinguish between good uops in the history buffer (uops which will be executed and retired) and bad uops which were prefetched and allocated but which will not be executed due to a failure of the predicted path (e.g., due to either a mispredicted branch or other event).

[0019] The RAT shadow array is a second copy of the RAT (the first copy being the primary array). Like the primary array, the shadow array includes a pointer or address to an RF entry (or physical register) corresponding to each logical register. The shadow array stores a processor RAT state (e.g., mappings from logical register to physical register) that allows the processor to recover from a mispredicted branch or other event (such as an interrupt). The shadow array can be moved forward or backwards to any position (or instruction) between allocation and retirement using the information stored in the history buffer.

[0020] As described above, the primary array is updated at allocation time. The shadow array can change states or locations (e.g., move forward or backwards) based on the information stored in the history buffer. The shadow array is moved forwards or backwards independently from the state or position of the primary array and independent of which uops have executed. The ability of the shadow array to move backwards allows the shadow array to be located anywhere, rather than be restricted to some minimum distance behind the primary array. If a branch instruction behind the location of the shadow array (i.e., earlier in the program order) mispredicts (creating a predicted path failure), the shadow array can be backed up sequentially until the shadow array reaches the point of path failure (i.e., to the last good uop or instruction). The shadow array can then be flash copied into the primary array to allow the primary array to quickly recover from the mispredicted branch. According to one example embodiment of the invention, the RAT attempts to keep the shadow array at the location of (or pointed to) the best estimate of the next mispredicted branch or event. If a path failure occurs (e.g., a mispredicted branch or an event is detected) near where the RAT shadow array is located, the RAT shadow array preferably is flash copied (e.g., all array entries copied in one clock cycle) into the RAT primary array, thereby quickly moving the RAT primary array back to the point (or state) near where the path failure occurred. Multiple RAT shadow arrays (e.g., each located at a different branch instruction) can also be used to recover from one of several anticipated mispredicted branches or events.
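The backing-up and flash-copy behavior described above can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the disclosed hardware: the record layout (logical register, old RF entry, new RF entry), the function names `step_forward`, `step_backward` and `flash_copy`, and the specific RF numbers are all made up for the example.

```python
# Illustrative sketch only; names and values are hypothetical.
# Each history record is (logical register, old RF entry, new RF entry),
# listed in allocation (program) order.
history = [("A", 3, 7), ("B", 0, 8), ("A", 7, 9)]


def step_forward(shadow, pos):
    """Apply (do) the uop at history[pos]; return the new position."""
    ldst, _old, new = history[pos]
    shadow[ldst] = new
    return pos + 1


def step_backward(shadow, pos):
    """Reverse (undo) the uop at history[pos - 1]; return the new position."""
    ldst, old, _new = history[pos - 1]
    shadow[ldst] = old
    return pos - 1


def flash_copy(shadow):
    """Model the copy of every shadow entry into the primary array at once."""
    return dict(shadow)


# Primary array after all three uops were allocated (most speculative state).
primary = {"A": 9, "B": 8}
# Assume the shadow array currently sits at the same position (3 uops applied).
shadow, pos = {"A": 9, "B": 8}, 3

# Assume a path failure is detected such that uop 0 is the last good uop:
# back the shadow array up sequentially until only uop 0 remains applied.
while pos > 1:
    pos = step_backward(shadow, pos)

primary = flash_copy(shadow)
print(primary)  # {'A': 7, 'B': 0}
```

In this sketch the shadow state moves one uop at a time, while the primary array is restored in a single step, which mirrors the distinction the text draws between sequential backing up and the flash copy.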

[0021] Architecture

[0022] Referring to the figures in which like numerals indicate like elements, FIG. 1 is a block diagram illustrating a portion of a processor according to an embodiment of the present invention. Specifically, the instruction pipeline is illustrated in FIG. 1. The processor illustrated in FIG. 1 is provided merely as an example embodiment, and the present invention is not limited thereto. Processor 100 includes an L1 instruction and data cache 102 for storing data and instructions, and an instruction decoder 104 for decoding instructions into one or more micro-operations (or micro-ops or uops). As used herein, the terms instruction and uop (or operation) may be used interchangeably, and include instructions, operations, micro-ops, or other types of instructions or operations. A trace cache 106 is coupled to the instruction decoder 104 for storing decoded uops. If one or more uops are re-executed, the uops can be directly retrieved from the trace cache 106, thereby avoiding refetching the instructions from the cache (or memory) and decoding the instructions.

[0023] A Register Alias Table (RAT) 108 and an allocator 120 each receive uops from trace cache 106. RAT 108 translates logical register names (logical source or Lsrc and logical destination or Ldst) into physical register addresses (physical source or Psrc and physical destination or Pdst). The RAT 108 also includes two arrays for storing a mapping from logical register to physical register: a primary array 110 stores the current (and most speculative) state, while a shadow array 112 stores a previous state that may be used to allow a quick and efficient recovery from a mispredicted branch or other event (such as an interrupt or trap). According to an example embodiment of the invention, the shadow array 112 can be located or pointed to the best estimate of the next mispredicted branch. The shadow array 112 can be pointed to a variety of different states of the processor. The physical registers are shown in FIG. 1 as the register file (RF) 136.

[0024] The allocator 120 allocates resources for each uop, and includes a history buffer 122 for storing past or historical logical register to physical register mappings and other information. The history buffer 122 allows the previous uops to be done or undone, and allows the shadow array 112 to be created using these historical mappings (from logical registers to physical registers) and other information stored in the history buffer 122. According to an embodiment of the present invention, the register file (RF) 136 stores or intermixes both temporary data and permanent (or retired) data. Because one register file is used to physically store temporary and retired data, it is therefore unnecessary to physically move the data at retirement, and processor speed can be accordingly improved. As a result, to keep track of which data in RF 136 is temporary, which data is retired, etc., the history buffer 122 includes a number of fields and pointers to keep track of the various states of data.

[0025] The processor 100 includes a re-order buffer (or ROB) 130 which determines when a uop has properly completed execution and retired. An instruction queue (IQ) 132 is connected to the RAT 108 and allocator 120 for storing uops waiting to be scheduled for execution. An out-of-order (OOO) scheduler 134 schedules uops in the IQ 132 for execution. Register file 136 is connected to scheduler 134 and includes an array of physical registers (or RF entries) for storing data. The execution units 138 are connected to the RF 136 and the scheduler 134 for executing uops. Each uop includes, as an example, one or more sources (e.g., two sources) and a destination. The execution units 138 receive the uop from the IQ 132 and scheduler 134. A data cache 140 is provided for storing memory data.

[0026] The RAT Primary Array and the Register File

[0027] FIG. 2 is a diagram illustrating a RAT primary array and a register file (RF) according to example embodiments of the present invention. In this example of FIG. 2, there are five logical registers A, B, C, D and E, and there are 15 RF entries (or physical registers) in the register file (RF) 136. These numbers are selected merely as examples. There could be almost any number of logical registers and RF entries (physical registers), so long as there is at least one physical register for every logical register. Register file (RF) 136 contains an array of the physical registers or RF entries. The RF 136 stores (or intermixes) both temporary (i.e., unretired) data and retired data.

[0028] FIG. 2 includes a column 202 identifying the logical register (i.e., either logical register A, B, C, D or E). The RAT primary array 110 includes a column 204 that includes pointers to RF entries of register file (RF) 136 to identify which RF entries have been mapped to the logical registers identified by column 202. In this example, the primary array 110 includes a pointer to RF3 (pointer to entry 3 of the RF 136) for logical register A, a pointer to RF0 for logical register B, a pointer to RF4 for logical register C, a pointer to RF2 for logical register D and a pointer to RF6 for logical register E. Thus, primary array 110 identifies the current (most speculative) state, and identifies, in this example, that data for logical register A is physically stored in RF3, the data for logical register B is stored in RF0, the data for logical register C is stored in RF4, etc.

[0029] FIG. 2 also illustrates an example embodiment of the register file (RF) 136. RF 136 in this example embodiment includes 15 entries (or physical registers). A column 210 identifies each RF entry (or physical register) for the register file 136. The register file 136 includes a data column 212 that stores the data for each of the RF entries. As noted above, for each logical register (in this example logical registers A-E of array 110), the primary array 110 includes a pointer to a physical register or RF entry where the data for that logical register is physically stored, or where the execution results will be stored after execution of the uop.

[0030] For example, as illustrated in FIG. 2, uop0 performs a write of data6 to logical register E. The allocator 120 (FIG. 1) selects (or allocates) RF6 as the next available physical register (i.e., entry in RF 136) for the uop (i.e., for storing the execution result of the uop). In this example shown in FIG. 2, RAT 108 updates the primary array 110 by storing the pointer to RF6 for logical register E. After updating the RAT primary array 110, array 110 indicates that the data for logical register E is presently stored (or will be stored after execution) in RF entry RF6. After this uop executes, the execution result (i.e., data6) is stored in RF6 as shown in FIG. 2.
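The FIG. 2 example can be restated as a small Python sketch. The names `primary_array`, `register_file`, `rename_destination` and `execute_write` are hypothetical, chosen only for illustration. The pointer held for logical register E before uop0 is not stated in the text, so the sketch leaves it as `None` rather than guessing a value.

```python
# Illustrative sketch only; names are hypothetical and not taken from the figures.
NUM_RF_ENTRIES = 15

# RAT primary array (column 204): logical register -> index of an RF entry.
# The pointer for E before uop0 is not given in the text, so it starts as None.
primary_array = {"A": 3, "B": 0, "C": 4, "D": 2, "E": None}

# Register file (data column 212): one data slot per physical register.
register_file = [None] * NUM_RF_ENTRIES


def rename_destination(ldst, pdst):
    """Point logical register `ldst` at the newly allocated RF entry `pdst`."""
    primary_array[ldst] = pdst


def execute_write(pdst, value):
    """Store an execution result in the physical register allocated to the uop."""
    register_file[pdst] = value


# uop0: write data6 to logical register E; the allocator selects RF6.
rename_destination("E", 6)   # at allocation time
execute_write(6, "data6")    # later, when the uop executes

print(primary_array["E"])                 # 6
print(register_file[primary_array["E"]])  # data6
```

Note that the mapping is updated at allocation time, before the data exists; the register file slot is filled only when the uop executes.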

[0031] Overall Operation In Instruction Pipeline

[0032] Referring to FIGS. 1 and 2, the overall operation of the processor 100 will be briefly discussed according to an example embodiment of the invention. Complex instructions are received from the cache 102 and decoded into one or more micro-ops or uops by the instruction decoder 104. The uops are stored in the trace cache 106. As output from the trace cache 106, each uop includes an operation-code (op-code), one or more source operands (or logical sources or Lsrcs) and one destination operand (logical destination or Ldst), for example. The logical sources (Lsrcs) and logical destination (Ldst) may refer to the logical registers A-E, but do not refer to the physical registers (RF entries). The trace cache 106 provides one or more uops per clock cycle to both the RAT 108 and to the allocator 120.

[0033] The allocator 120 receives at least the op-code of each uop and determines what kind of resources are needed to execute the uop. The allocator 120 then allocates resources for the uop, including allocating the next free or available RF entry (or Pdst) in RF 136 for the uop (i.e., for storing the execution result of the uop). This point in the pipeline can be referred to as allocation time. The allocator 120 then provides the address or pointer to this new RF entry (the physical destination or Pdst) for this uop to the IQ 132 and the RAT 108. The pointer to the new RF entry (Pdst) for this uop is provided to the RAT via line 113, for example.

[0034] The RAT 108 receives at least the two logical sources (Lsrcs) and the logical destination (Ldst) of the uop from the trace cache 106 and identifies the current physical registers (i.e., physical sources and physical destination) corresponding to the logical sources (Lsrcs) and the logical destination (Ldst) for the uop using the RAT primary array 110. RAT 108 can identify a corresponding physical register (RF entry) by identifying the RF pointer in column 204 of primary array 110 for each logical register (Lsrc or Ldst). RAT 108 provides at least the RF pointers to the physical sources (Psrcs) of the uop to the IQ 132. Thus, as an example, the IQ 132 receives the op-code of the uop from trace cache 106 via line 119, receives a pointer or address to the physical destination (Pdst) for the uop (i.e., for storing the execution result of the uop) from the allocator 120, and receives pointers or addresses to the two physical sources (Psrcs) for the uop from RAT 108. As a result, the IQ 132 receives substantially the same uop as stored in trace cache 106, but receives the physical source and physical destination pointers or addresses rather than the logical addresses.

[0035] RAT 108 also receives the address of (or pointer to) the new physical destination (Pdst) for the uop (corresponding to the Ldst) from allocator 120 via line 113. RAT 108 updates the primary array 110 to store the pointer to the new physical destination (Pdst) for the uop corresponding to the logical register (the Ldst). For example, if a uop designates logical register A as the logical destination, and allocator 120 allocates RF12 (e.g., as the next available RF entry) as the physical destination (Pdst) for the uop, RAT 108 updates the pointer in column 204 (FIG. 2) for logical register A in primary array 110 to point to RF12.

[0036] However, before updating the primary array 110 to identify the new physical register (Pdst) corresponding to the logical register A for the uop, RAT 108 reads out from primary array 110 and stores the pointer to the old physical register or RF entry (Pdst) corresponding to the logical register A. (Register A is, again, used only as an example.) This pointer to the old physical destination (RF entry) for register A is provided from the RAT 108 to the allocator 120 via line 117 and is used by the allocator 120 to create a new entry in the history buffer 122. The history buffer 122 is described in greater detail below.

[0037] The uop stored in the IQ 132, including an op-code, physical source addresses or pointers and a physical destination address or pointer, is provided to the scheduler 134 for scheduling for execution. At the appropriate time, the op-code is provided via line 139 to the execution units 138, and source data may be provided from the physical registers from the RF 136 to execution units 138 as identified by the physical sources of the uop. The scheduler 134 also provides the pointer to the physical destination for the uop (for storing execution results) to the RF 136 and to the execution units 138. The execution units 138 (e.g., one of the execution units) execute the uop and store the execution result in the physical register (i.e., RF entry) designated by the uop. In the above example, if RF12 was allocated by allocator 120 for the uop, the execution result for the uop would then be physically stored in RF12. RAT primary array 110 stores the mapping from logical register A to the physical register RF12 (where the data for register A is physically stored).

[0038] If the next uop also writes to logical register A, a similar procedure would be followed. Allocator 120 allocates the next available RF entry for the uop and provides a pointer to this RF entry to the RAT 108 via line 113. The RAT 108 reads the old pointer (old RF entry) from column 204 of array 110 for logical register A (i.e., RF12), and provides this old RF entry to the allocator 120 via line 117 for creating another entry in the history buffer 122 (recording both the old and new RF entries and logical register for the uop). RAT 108 then stores the new RF entry for the logical register in column 204 of the RAT primary array 110. The history buffer 122 stores information that allows the uop to be done or undone.
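The renaming sequence of the preceding paragraphs (look up the physical sources, read out the old destination pointer, record it, then update the primary array) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `HistoryEntry`, `rename_uop` and `free_list` are hypothetical, the choice of source registers is arbitrary, and RF13 as the entry allocated after RF12 is assumed only for the example.

```python
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    ldst: str      # logical destination of the uop
    old_pdst: int  # RF entry previously mapped to ldst
    new_pdst: int  # RF entry newly allocated for the uop


def rename_uop(primary_array, history, free_list, lsrcs, ldst):
    """Rename one uop; return (psrcs, pdst) and record a history entry."""
    # Sources are translated using the mapping in effect before this uop.
    psrcs = [primary_array[lsrc] for lsrc in lsrcs]
    pdst = free_list.pop(0)          # allocator: next available RF entry
    old_pdst = primary_array[ldst]   # read out the old pointer first
    history.append(HistoryEntry(ldst, old_pdst, pdst))
    primary_array[ldst] = pdst       # then update the primary array
    return psrcs, pdst


primary = {"A": 3, "B": 0, "C": 4, "D": 2, "E": 6}
history = []
free_list = [12, 13]  # assumed free RF entries, for illustration only

# Two successive uops that both write logical register A.
first = rename_uop(primary, history, free_list, ["B", "C"], "A")
second = rename_uop(primary, history, free_list, ["A", "D"], "A")

print(first)   # ([0, 4], 12)
print(second)  # ([12, 2], 13)
```

The second uop reads logical register A as a source and therefore sees RF12, the entry allocated to the first uop, while its own history entry records RF12 as the old entry and RF13 as the new one.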

[0039] History Buffer

[0040] FIG. 3 is a diagram of a history buffer according to an example embodiment of the present invention. History buffer 122 is an array that stores historical state information of the logical and physical registers that allows a uop to be done (performed) or undone (reversed), as reflected in the primary array. In other words, the history buffer 122 allows the effects of each uop as seen by the logical registers to be done (performed) or undone (reversed). The successive mappings from logical registers to physical registers as allocated by allocator 120 and mapped by RAT 108 for each uop are recorded in the history buffer 122, and thus, the effects to the logical registers resulting from each uop can be performed or reversed, step by step (i.e., one uop at a time) using information stored in the history buffer 122.

[0041] Referring to FIG. 3, an example history buffer 122 is shown, and includes information for 10 RF entries. In this example, there are 5 renameable logical registers A-E as well as 15 RF entries or physical registers (i.e., RF1-RF15) in the register file (RF) 136 (RF 136 is not shown in FIG. 3). The minimum size of the history buffer 122 is determined as:

[0042] Minimum size of history buffer 122 = (no. of physical registers) - (no. of logical registers). In the example of FIG. 3, this gives 15 - 5 = 10 entries. This size allows the state information for all RF entries (physical registers) to be tracked. The history buffer 122 can be larger.

[0043] The various fields and pointers in the history buffer 122 (described in detail below) allow the processor to keep track of the various data and states. Each pointer in the history buffer may be, for example, a 7-bit value that indexes or points to a particular entry in the history buffer 122. Each pointer in the history buffer 122 is readable and writeable such that each pointer can be cleared or set to any value.

[0044] Referring to the history buffer 122 of FIG. 3, each column includes an XXX/New field 310, a Free/Old field 312, a logical destination field 314, a Retire field 316 and a jump color path field 318 (also known as the path field). The logical destination field 314 identifies the logical destination for the uop (e.g., either register A, B, C, D or E). The XXX/New field 310 identifies the new RF entry for the logical register. X is used in field 310 if no new RF entry has been assigned yet to the logical register. The Free/Old field 312 is a list of free (or available) RF entries (if unallocated) or identifies the old RF entry (previous RF entry) if allocated and not yet retired. The Retire field 316 is a 1 if the uop has been executed and retired, thereby making the old RF entry free to be reallocated for a new uop. If the Retire field 316 is a 1, the corresponding Free/Old field 312 indicates a "free" (or available) RF entry. If the Retire field 316 is a 0 (meaning the uop and old RF entry are not yet retired), then XXX/New field 310 will refer to a new RF entry (a new Pdst) and the Free/Old field 312 will refer to an old RF entry (an old Pdst) because the old RF entry is not yet free (available). The jump color path field 318 of history buffer 122 is described below.

[0045] When a uop is retired, it is no longer necessary to store the state information associated with that uop because there are usually no circumstances in which one would want to back up the processor to the state just prior to that uop (and, thus, the historical information stored in the history buffer 122 for this uop can be deleted). Therefore, the old RF entry 312 (FIG. 3) for the retired uop (i.e., the previous or old physical register used to store the execution results) is made available (i.e., de-allocated) to be reallocated as a Pdst for a new uop. Thus, in this manner, when a uop properly completes execution and is retired, the ROB 130 (FIG. 1) notifies the allocator 120 that the uop has been retired. The allocator 120 then sets the corresponding Retire bit (or field) 316 in the history buffer 122 to a 1 and moves a retirement pointer (R) 320 past the corresponding column to indicate that the uop and its associated new RF entry 310 (Pdst) have been retired and the old RF entry 312 (corresponding to the same logical register) is now available or free to be reallocated as the Pdst for a new uop.
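The retirement bookkeeping described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the class name `Column`, the function `retire_next` and the explicit `free_entries` list are hypothetical, and the history buffer is modeled as a plain list indexed in allocation order, whereas the pointers of FIG. 3 are described as moving right to left.

```python
from dataclasses import dataclass


@dataclass
class Column:
    new_pdst: int     # XXX/New field 310
    old_pdst: int     # Free/Old field 312
    ldst: str         # logical destination field 314
    retired: int = 0  # Retire field 316


def retire_next(history, r_ptr, free_entries):
    """Retire the uop at r_ptr: set its Retire bit, free the old RF entry,
    and return the advanced retirement pointer. No register data is moved."""
    col = history[r_ptr]
    col.retired = 1
    free_entries.append(col.old_pdst)  # old RF entry may now be reallocated
    return r_ptr + 1


# Two uops that wrote logical register A (RF numbers are illustrative).
history = [Column(12, 3, "A"), Column(13, 12, "A")]
free = []
r = 0

r = retire_next(history, r, free)
print(r, free, history[0].retired, history[1].retired)  # 1 [3] 1 0
```

Only a flag and a pointer change at retirement in this model; the data for logical register A stays where it was written, in RF12.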

[0046] History buffer 122 also includes three pointers, including an allocation pointer (A pointer) 324, a shadow pointer (S pointer) 322 and a retirement pointer (R pointer) 320. All three pointers typically move right to left (although the shadow pointer 322 can move either direction depending on whether the shadow array 112 is moving forward or backward). Allocation pointer 324 points to the next free (or available) RF entry that will (usually) be allocated for the next uop (i.e., allocated as the Pdst for storing the execution results of the next uop). Thus, in general, the Free/Old RF entries on and to the left of the allocation pointer 324 are free (unallocated), while Free/Old RF entries to the right of the allocation pointer 324 are old or allocated and may or may not be retired yet.

[0047] The retirement pointer 320 (FIG. 3) points to the next RF entry that will be retired. Old RF entries to the left of the retirement pointer 320 and having a 0 in the Retire field 316 are allocated (in use) and are not yet retired. Old RF entries to the right of the retirement pointer 320 having a 1 in the Retire field 316 have been retired. Old RF entries to the right of the retirement pointer 320 which have a 0 in the Retire field 316 were not retired and will not be retired (usually because these uops were part of a mispredicted path that should not be retired or committed to architectural state). The shadow pointer 322 points to the next new RF entry that will be updated in the shadow array, as described in greater detail below.

[0048] As noted in the Background above, some past systems have physically stored temporary or speculative data (unretired uop execution results) in one array (such as a ROB) and the retired data (indicating the architectural state of the processor) in a physically separate array (e.g., a Real Register File). According to such a prior technique, when the execution results or temporary data were retired, the data were physically moved or copied from the first array (or ROB) into the second array (or RRF). As processors increase in speed, however, it becomes more difficult to physically move the data at retirement from the ROB to the RRF.

[0049] In contrast to this previous technique, the present invention intermixes both temporary or speculative (i.e., unretired) data and retired data (indicating the architectural state of the processor) in a single register file (RF) 136. As noted above, when a uop is retired, the Retire field 316 for the uop is set to a 1 and the retirement pointer 320 is incremented to the next uop. Thus, the most recently retired data for each logical register indicates the current architectural state of the processor. As noted, Old RF entries in the history buffer 122 to the right of the retirement pointer 320 having a 1 in the Retire field 316 have been retired, and are considered architectural data. The remaining RF entries which have been allocated may also store temporary or unretired data (execution results) which is speculative data (speculative because it has not yet been retired or committed to architectural state, and it is uncertain whether this temporary data will be retired). Therefore, the use of a single data array to store both unretired (or speculative) data and retired (or architectural state) data allows a much simpler and faster technique to be used to effect retirement because only a retirement pointer 320 and a Retire flag 316 are updated at retirement (rather than physically moving the data between data arrays).

[0050] Jump Color Path Field of The History Buffer

[0051] The purpose of the jump color path field 318 in the history buffer 122 will now be briefly described. The jump color path field 318 (or path field 318) is used to allow the processor 100 to distinguish between good uops (uops which will be executed and retired) and bad uops which were prefetched and will not be executed due to a failure of the predicted path, due to either a mispredicted branch or other event.

[0052] The processor 100 speculatively prefetches instructions and decodes them into uops for execution. To improve performance, branch prediction logic is provided to make more intelligent decisions regarding what information to prefetch from memory. Whenever a branch uop enters the instruction pipeline, the prediction logic predicts whether the branch will be taken, and instructions from the predicted path are prefetched and decoded for execution. Uops are fetched and decoded in program order, and may execute out of order. If a branch was mispredicted, all uops prefetched after the mispredicted branch are bad or incorrect uops and must be flushed from the pipeline, and the processor begins prefetching from the correct path. However, because uops can execute out of order (i.e., in an order that is different from the order in which the uops were fetched and decoded), several uops may have been fetched, and RF entries allocated for each uop, before a previous mispredicted branch is detected. Because, for example, an RF entry (Pdst) was already allocated for each of these (bad) uops when the mispredicted branch was detected, the processor needs a technique to distinguish bad uops (or the RF entries in the history buffer 122 allocated to bad uops) from the good uops in the history buffer 122. The bad uops will not be retired and thus should not be reflected in the history buffer 122 as either a current most speculative state or an earlier state of the processor. Thus, the shadow pointer 322 and retirement pointer 320, after stepping to the mispredicted branch, will need to skip over any bad uops (the RF entries allocated for bad uops in array 110) up to the uops (or their allocated RF entries in the history buffer 122) of the correct path. The jump color path field 318 allows the processor to distinguish between RF entries for good uops (the correct path) and RF entries for bad uops (the mispredicted path).

[0053] The jump color path field (or “path”) identifies micro-ops that correspond to a particular path. A new “path” is created after each mispredicted branch (or other event). According to an embodiment of the invention, the path field 318 (jump color path 318) allows a processor to distinguish between bad uops (allocated RF entries) corresponding to a mispredicted (or incorrect) path (RF entries allocated before detection of the mispredicted branch) and subsequent good uops corresponding to the new correct path, which were decoded and had RF entries allocated after detection of the mispredicted branch. After the shadow and retirement pointers step to the mispredicted branch, the shadow and retirement pointers should skip the bad uops in the primary array 110 up to the first good uop (after the mispredicted branch). This is indicated by the first uop (or old RF entry in array 110) after the mispredicted branch in which the jump color path field 318 changes.

[0054] At allocation time for each of the new (correct path) uops, the allocator 120 allocates an available RF entry for the Pdst for the uop, and the primary array 110 and the history buffer 122 are updated as usual. However, in the history buffer 122, the jump color path field 318 will be changed to a new or different value for the new correct uops as compared to the old uops. A new “path” is created each time an event or mispredicted branch is detected. This new path is established or indicated in the history buffer 122 by using a different value for the jump color path field 318 in history buffer 122. For example, a first path can be referred to as the “blue” path, while a second (correct) path (after an event or mispredicted branch is detected) may be a “green” path, with a different value used in path field 318 for the green path as compared to the value used for the blue path.

[0055] According to an embodiment of the invention, the new path (i.e., the use of a different value for the jump color path field 318) is started beginning at the location or entry in history buffer 122 where the allocation pointer (A) 324 is pointing when the event or the mispredicted branch is detected. One or more RF entries for (bad) uops subsequent to the mispredicted branch may have already been allocated before the mispredicted branch or event was detected. The jump color path field allows the shadow pointer 322 and the retirement pointer 320 to skip over these bad entries in the history buffer 122 (since the current speculative state or past state of the processor should not reflect these bad uops, which will never be retired). Thus, according to an embodiment of the invention, the uops in buffer 122 after the event or mispredicted branch which are part of the same path (i.e., same value in path field 318) as the mispredicted branch are bad, and should be skipped.
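The skipping rule can be sketched as a scan for the first entry after the mispredicted branch whose path value differs from that of the branch. The function name `first_good_after`, the linear (non-wrapping) indexing, and the sample path list are assumptions of this sketch rather than details of the described embodiment.

```python
def first_good_after(paths, branch_idx, alloc_idx):
    """Return the index of the first good uop after the mispredicted branch.

    paths      -- jump color path value (field 318) for each history entry
    branch_idx -- entry of the mispredicted branch
    alloc_idx  -- entry the allocation pointer points to (first unallocated)

    Entries after the branch that carry the branch's own path value are bad
    and are skipped. Returns None if no correct-path uop is allocated yet.
    """
    branch_path = paths[branch_idx]
    for i in range(branch_idx + 1, alloc_idx):
        if paths[i] != branch_path:
            return i
    return None


# Illustrative values: branch at entry 1, two bad "blue" uops after it,
# then two correct-path "green" uops.
paths = ["blue", "blue", "blue", "blue", "green", "green"]
skip_to = first_good_after(paths, branch_idx=1, alloc_idx=6)
```

Bounding the scan by the allocation pointer matters because unallocated entries may still hold stale path values.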

[0056] According to an embodiment of the present invention, a uop may traverse two separate and distinct functional pipelines (distinct from the instruction pipeline described above), including an allocation (or primary) pipeline and a shadow pipeline. These “pipelines” are not strict definitions, but merely provide a way to view the groups of steps or functions performed on a uop or instruction.

[0057] The allocation pipeline may refer to the steps performed relating to allocation of a uop, including allocating a new RF entry for a uop and updating the primary array 110 based on the new allocation information. The allocation pipeline also includes reading out the old RF entry from the primary array and updating the history buffer 122 when a new RF entry is allocated, based on the allocation information (e.g., the logical register, the old RF entry and the newly allocated RF entry). Additional steps, which may be considered a “retirement pipeline,” include updating the history buffer 122 when an old RF entry is retired (e.g., de-allocating an RF entry by setting the Retire bit 316 and moving the retirement pointer 320 to make the old RF entry available to be allocated to a new uop).

[0058] The shadow pipeline includes the steps of appropriately updating the shadow array 112 based on the history buffer 122, for example, to be as close as possible to the next branch uop (or to point to the best estimate of the next mispredicted branch or event). Because the RAT 108 uses information in the history buffer 122 to update the shadow array 112, the shadow array 112 can be updated asynchronously as compared to the updating of the primary array 110 (i.e., without regard to the state or timing of the primary array). Thus, the shadow array 112 and the primary array 110 are substantially decoupled or independent from each other. The primary array 110 and the shadow array 112 functionally interconnect only when a flash copy is made from the shadow array 112 to the primary array 110 in response to detection of a failure in the predicted path, such as detection of a mispredicted branch or detection of an event.

[0059] RAT Shadow Array

[0060] FIG. 4 is a diagram illustrating a RAT shadow array according to an example embodiment of the invention. The RAT shadow array 112 is a second copy of the RAT (the first copy being the primary array 110), and has a structure that is similar to the primary array 110. Shadow array 112 includes a pointer 412 or address to an RF entry (or physical register) corresponding to each logical register 410. The shadow array 112 stores a processor RAT state (e.g., mappings from logical register to physical register) that allows the processor to recover from a mispredicted branch or other event (such as an interrupt). As shown in the example shadow array 112 illustrated in FIG. 4, the logical register A is mapped to RF3, logical register B is mapped to RF1, logical register C is mapped to RF4, etc.

[0061] The motivation or reason for providing a shadow (or secondary) array will be briefly described. According to an embodiment of the invention, the RAT primary array 110 reflects the current and most speculative state of the processor. As described above, at allocation time (i.e., when resources are allocated for a uop, including allocating an RF entry as the Pdst for the uop), the RAT primary array 110 is updated to reflect this new speculative state for the logical registers. In other words, at allocation time, the logical to physical register mappings in RAT primary array 110 are updated to reflect the allocation of a new RF entry to a uop, where the RF entry is assigned as the physical destination (Pdst) for the logical destination (Ldst) of the uop. Thus, at allocation time, the RAT primary array is updated to reflect this new mapping from logical register to physical register. However, as described above, the state stored in the primary array is “speculative.” An event or a mispredicted branch may cause the predicted path to fail (i.e., where one or more pre-fetched instructions will be bad and will not be retired), which may also cause the speculative state stored in the RAT primary array 110 to be inaccurate or incorrect.

[0062] For example, when a mispredicted branch is detected, the instruction pipeline is flushed and uops after the mispredicted branch along a correct path are fetched and decoded for execution. When the mispredicted branch is detected, the RAT primary array 110 may contain a speculative state (i.e., register mappings) that is many uops ahead (in program order) of the mispredicted branch. In order to correct the information (or state) stored in the primary array 110, the state of the RAT primary array 110 should be backed up to the state just after allocation of the RF entry for the mispredicted branch (since just after the branch is where the new correct path uops will begin fetching and executing). Fortunately, according to an embodiment of the invention, the history buffer 122 stores the information necessary to undo or reverse the logical register to physical register mappings performed for each uop. Thus, according to an embodiment of the invention, the RAT 108 and the allocator 120 can use the information stored in history buffer 122 to step the RAT primary array 110 back one or more uops per clock cycle until the primary array reaches the state at or just after the mispredicted branch or event. The RAT primary array 110 can be stepped back one uop by replacing the pointer 204 in primary array 110 for a logical register with the pointer to the old RF entry (field 312 in history buffer 122). This moves the RAT primary array 110 one uop back.
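A minimal sketch of stepping the primary array back one uop at a time is given here. The tuple layout `(new_rf, old_rf, ldst)`, the function name `step_back`, and the sample mappings are assumptions chosen for illustration; one loop iteration stands in for one step and is not a timing model.

```python
def step_back(primary, history, alloc_idx):
    """Undo the most recent allocation and return the new allocation index.

    history holds (new_rf, old_rf, ldst) per uop, mirroring fields 310, 312
    and 314. The pointer for the logical register is replaced with the old
    RF entry, which moves the primary array one uop back.
    """
    alloc_idx -= 1
    new_rf, old_rf, ldst = history[alloc_idx]
    assert primary[ldst] == new_rf, "history and primary array disagree"
    primary[ldst] = old_rf
    return alloc_idx


# Illustrative state: register A was renamed RF1 -> RF6, then C RF3 -> RF7.
primary = {"A": 6, "B": 2, "C": 7, "D": 4, "E": 5}
history = [(6, 1, "A"), (7, 3, "C")]

alloc_idx = len(history)
steps = 0
while alloc_idx > 0:
    alloc_idx = step_back(primary, history, alloc_idx)
    steps += 1
```

The number of steps grows with the distance between the primary array and the recovery point, which is the cost the shadow array is meant to avoid.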

[0063] However, the primary array 110 may be many uops ahead of the execution units, and may be even 50 or 100 uops ahead, for example. As such, the time penalty or price for a mispredicted branch could be very expensive and severe, e.g., up to 50 or 100 clock cycles. This is a high price (i.e., large time delay) to pay for each mispredicted branch or event, and can significantly degrade processor performance. Therefore, according to an embodiment of the present invention, a second copy of the RAT is maintained as the RAT shadow array 112 (with the first copy of the RAT being the RAT primary array 110) to allow the RAT primary array 110 to recover more quickly from an event or mispredicted branch.

[0064] According to an embodiment, both the primary array 110 and the shadow array 112 may step forward one or more uops at a time. As described above, the primary array 110 is updated at allocation time. The shadow array is updated (or changed) asynchronously (e.g., independent and decoupled from the primary array 110) based on the information stored in the history buffer 122. Thus, the shadow array 112 may be one or two cycles behind the primary array 110, for example, but there are no restrictive requirements on where the shadow must be located. For example, the shadow array may be located anywhere between allocation and retirement. In this example, the shadow array 112 continues following the primary array 110, stepping ahead one or more uops at a time, until the shadow array reaches an estimate of the next mispredicted branch or event (as an example location). The shadow array 112 may stop, for example, just before, on, or just after the estimated next mispredicted branch or event. The RAT shadow array 112 is maintained at this best estimate of the next mispredicted branch or event until it is determined whether or not the execution of the uop or branch near where the shadow array is located resulted in a failure of the subsequent path. For example, the shadow array waits at the best estimate of the next mispredicted branch until it is determined whether or not the branch was correctly predicted. Thus, the shadow will probably wait until the branch or uop executes, but will not have to wait for the uop or branch instruction to be retired. If the branch instruction near where the shadow array is located was correctly predicted (or the uop does not generate an event), no path failure results at that point and the shadow array 112 would then resume sequentially stepping forward (e.g., one or more uops per clock cycle) using information in the history buffer 122 up to the next estimated event or mispredicted branch, where the shadow array 112 would again wait for an indication as to whether the branch was correctly predicted or not (or whether the uop generated an event).

[0065] If a path failure occurs (e.g., a mispredicted branch or an event is detected) at or near where the RAT shadow array 112 is located, rather than stepping the primary array 110 back one uop at a time (which can be very time consuming), the RAT shadow array 112 can be flash copied (e.g., all array entries copied in one clock cycle) into the RAT primary array 110, thereby quickly moving the RAT primary array 110 back to the point (or state) at or near where the path failure occurred (e.g., back to the state near the mispredicted branch or to the uop that generated the event). If the path failure occurred near the location of the RAT shadow array 112, the RAT shadow array 112 can be moved or adjusted to the correct state if necessary before flash copying the RAT shadow array 112 into the RAT primary array 110.

[0066] Therefore, after the event occurs, the RAT shadow array 112 is adjusted to the correct state if necessary (e.g., the shadow is moved or adjusted if the shadow is not at the correct state when the event occurs). The “correct” state may be different in different implementations. According to an embodiment, the correct state could be, for example, the state up to and including the allocation of the mispredicted branch instruction or up to and including the instruction that caused the event.

[0067] Thus, the RAT primary array 110 can use the RAT copy in the shadow array 112 to recover the correct RAT state more quickly after a predicted path failure (e.g., recover from a mispredicted branch or event) if the shadow array 112 is located at or near the point of path failure.

[0068] According to one example of the invention, the RAT 108 attempts to keep the shadow array 112 located at (pointed to) the state of the best estimate of the next mispredicted branch or event. The RAT 108 may attempt to keep the shadow array 112 as close to the next mispredicted branch as possible (e.g., on or just before or just after the next branch). RAT 108 may use other algorithms or even heuristics or learning processes for locating the shadow array 112 in a position that allows the RAT the quickest or most efficient recovery from a predicted path failure. According to an embodiment, a branch predictor uses branch history information to provide a confidence level for each branch instruction that indicates the probability that a branch was correctly predicted. According to an embodiment, the RAT shadow array or arrays are preferably located at one or more branches where there is a relatively low probability that the branch was correctly predicted (i.e., located where an event is more likely to occur). When the processor determines that the branch was correctly predicted, the shadow array 112 then continues sequentially stepping ahead until it reaches the next branch (or the estimate of the next mispredicted branch or event).

[0069] Although techniques are described herein for the placement or movement of the shadow array 112 in order to allow a quick recovery by the RAT from a failure in the predicted path, other techniques or algorithms can be used as well. According to an embodiment, the shadow array 112 is very flexible and can be moved forward and backward to virtually any uop or state between uop allocation and uop retirement using the information in the history buffer 122. According to an embodiment of the invention, if the shadow array 112 moves on past a particular branch (or other uop), and that branch later mispredicts (or the uop generates an event), the shadow array 112 can be sequentially backed up one or more uops at a time until the shadow array reaches the mispredicted branch. When the shadow array 112 has been backed up to the point of the predicted path failure (e.g., to the mispredicted branch or to the uop that generated an event), the shadow array 112 is then flash copied to the primary array 110 to allow the primary array 110 to recover from the predicted path failure.
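The forward and backward movement of the shadow array, followed by a flash copy, can be sketched as shown here. The class `ShadowArray`, the `position` counter (number of history entries applied), the tuple layout `(new_rf, old_rf, ldst)`, and the sample mappings are assumptions of this sketch; the dictionary update merely stands in for the single-cycle hardware copy.

```python
class ShadowArray:
    """Second copy of the RAT that is moved using the history buffer only."""

    def __init__(self, mapping, position=0):
        self.mapping = dict(mapping)
        self.position = position  # number of history entries applied so far

    def step_forward(self, history):
        new_rf, _old_rf, ldst = history[self.position]
        self.mapping[ldst] = new_rf   # apply the renaming of this uop
        self.position += 1

    def step_back(self, history):
        self.position -= 1
        _new_rf, old_rf, ldst = history[self.position]
        self.mapping[ldst] = old_rf   # undo the renaming of this uop

    def move_to(self, history, target):
        while self.position < target:
            self.step_forward(history)
        while self.position > target:
            self.step_back(history)


def flash_copy(shadow, primary):
    """Overwrite every primary array entry with the shadow array's state."""
    primary.clear()
    primary.update(shadow.mapping)


# Illustrative history: A RF1 -> RF6, C RF3 -> RF7, E RF5 -> RF8.
history = [(6, 1, "A"), (7, 3, "C"), (8, 5, "E")]
primary = {"A": 6, "B": 2, "C": 7, "D": 4, "E": 8}  # most speculative state

shadow = ShadowArray({"A": 1, "B": 2, "C": 3, "D": 4, "E": 5})
shadow.move_to(history, 3)   # shadow runs ahead past all three uops
shadow.move_to(history, 1)   # the first uop turns out to be the failure point
flash_copy(shadow, primary)
```

Because both directions read only the history buffer, the shadow never needs to consult the primary array while it moves.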

[0070] The fact that shadow array 112 can be moved backwards means that it is unnecessary to keep the shadow array 112 behind the point of uop retirement (or the point where it is confirmed that the branch instruction was correctly predicted). Without the ability to back up the shadow array, the shadow array would typically have to be located at or behind the point of retirement (or the point where a uop is confirmed to have executed properly without an event), rather than moving ahead to the best estimate of the next point of predicted path failure (e.g., to the estimate of the next mispredicted branch). The location of the best estimate of the next predicted path failure may be much closer to the location of the primary array 110 than the retirement point. Thus, without the ability to back up the shadow array 112, the penalty for a mispredicted branch (or other predicted path failure) could be much greater. The flexibility of the shadow array 112 therefore can decrease the penalty associated with a mispredicted branch or other predicted path failure.

[0071] The shadow array 112 may be considered to be decoupled or independent from the primary array 110 because the shadow array 112 is not required to be located at a particular point of execution or retirement or at some other predetermined location or uop with respect to the state of the primary array. For example, it is not necessary for the shadow array 112 to maintain a predetermined distance (e.g., of at least 20 uops) between it and the primary array 110. Rather, the shadow array 112 is flexible and can move freely between the point of allocation and the point of uop retirement without regard to the location or operation of the primary array. This, in part, is made possible by its ability to move backward as well as forward, using the information in the history buffer 122.

[0072] According to an embodiment of the invention, an architecture is provided in which there is only one array (the primary array 110) in which register mapping (logical to physical) or register renaming occurs. The shadow array 112 can be used to allow the RAT primary array 110 to recover more quickly from a predicted path failure. However, according to an embodiment, there is preferably no data path output from the shadow array 112, and the shadow array 112 is preferably not used to actively map or rename registers. Rather, according to an embodiment, the shadow array 112 is moved to the best estimate of where the next mispredicted branch or other path failure will likely occur. If a predicted path failure occurs at or near that point, the shadow array 112 is adjusted to the correct state if necessary and is then copied into the primary array 110. However, the RAT preferably does not switch over to using the shadow array 112, as that would require additional or more complex circuitry to allow a switching or muxing between each array and additional data paths from the shadow array 112. In other words, the processor 100 preferably does not read Pdst information out of both the primary array 110 and the shadow array 112, but only reads out of the primary array 110. This is only one advantageous aspect of the invention, but is not required. Other embodiments are possible.

[0073] According to an embodiment, several (or multiple) shadow arrays can be employed to allow the RAT primary array 110 to recover from any of several possible path failures (e.g., mispredicted branches or events). For example, eight shadow arrays can be used, in which each shadow array 112 uses the information in the history buffer 122 to step forward (behind the primary array 110). When a first possible path failure is identified (e.g., the first branch uop), the first shadow array stops at or near that first branch. The remaining seven shadow arrays continue stepping forward until they reach a second point (e.g., a second branch uop) where a path failure is possible or likely, and the second shadow array stops at or near this uop. The remaining six shadow arrays 112 continue this process until each of the eight shadow arrays 112 (or at least some of them) have reached a different point of possible path failure (e.g., reached a different branch or other uop where an event can be generated). If one of the eight points or uops (e.g., branch instructions) creates a path failure (e.g., if an event is generated or a branch is mispredicted), the RAT shadow array 112 at (or corresponding to) the point of path failure is flash copied into the primary array 110, and into the other shadow arrays as well. The primary array 110 and all the shadow arrays would then continue moving forward in the same manner as described above from the point of failure (e.g., from the mispredicted branch) along a correct path. If the corresponding shadow array is not exactly on the point of failure (e.g., if the path fails between where two of the shadow arrays are located), the shadow array 112 that is closest to the point of path failure is selected. This selected shadow array closest to the point of failure is then moved forward or backward (as necessary) to reach the point of failure (i.e., moved to the state or point of the mispredicted branch), and this adjusted shadow array 112 is then flash copied into the primary array 110 and the other shadow arrays.
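Selecting the shadow array closest to the point of failure can be sketched as a minimum-distance search. The function name `select_shadow`, the use of history buffer positions as integers, the tie-breaking toward the lower index, and the sample positions are assumptions of this sketch.

```python
def select_shadow(shadow_positions, failure_pos):
    """Pick the shadow array nearest to the point of path failure.

    Returns (index, steps), where steps > 0 means the selected shadow must
    step forward and steps < 0 means it must back up before the flash copy.
    Ties go to the lower index, which is an arbitrary choice of this sketch.
    """
    index = min(
        range(len(shadow_positions)),
        key=lambda i: abs(shadow_positions[i] - failure_pos),
    )
    return index, failure_pos - shadow_positions[index]


# Illustrative values: three shadows parked at different branch uops.
positions = [3, 9, 14]
index, steps = select_shadow(positions, failure_pos=11)
```

Here the path fails between the second and third shadows; the second is two uops away and is stepped forward, rather than backing the third up by three.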

EXAMPLES

[0074] Some aspects of the invention will be further explained with reference to the following examples. FIG. 5 is a diagram of a history buffer and a RAT primary array for three example steps. An example uop stream 505 (including the destination register for each write operation) is shown as an example for explaining aspects of the invention. There are five logical registers in the RAT primary array 110 (registers A-E) and there are ten columns or entries in the history buffer 122. Step 1 of FIG. 5 illustrates a power-on reset condition. According to an embodiment of the invention, in the power-on reset condition (step 1, FIG. 5), the first five physical register pointers (RF1-RF5) are loaded into primary array 110 for logical registers A-E, respectively, as shown in FIG. 5. Pointers to the remaining RF entries (RF6-RF15) are also loaded in numerical order in the history buffer as shown. The allocation pointer (A) 324, shadow pointer (S) 322 and retirement pointer (R) 320 initially point to the first (right-most) column or entry in the history buffer 122. The retire field 316 is set to 1 for all entries or columns in the history buffer to indicate that all the RF entries identified in free/old field 312 are free (or unallocated). This list of RF entries in the free/old field 312 having R fields set to 1 is therefore considered a “free list.”

[0075] Step 2 of FIG. 5 illustrates the result of issuing uop u0 (i.e., allocating an RF entry and updating primary array 110 and history buffer 122) according to an example embodiment of the present invention. As shown in the list of uops 505, uop u0 is a write to logical register A. At allocation time for uop u0, the allocator 120 allocates the next available RF entry as the Pdst for u0. In this case, the next available RF entry is RF6. After the RF entry (RF6) is selected for the uop u0, the RAT 108 reads out the old RF entry pointer (to RF1) in primary array 110 for register A, and stores this old RF entry pointer in the free/old field 312 of the first entry of history buffer 122, shown in FIG. 5 as line 520. The newly allocated RF pointer (pointer to RF6) is then stored in the new field 310 for this entry in the history buffer 122, shown as line 522. An A is written to the logical destination field 314 for uop u0 to indicate that logical register A is being renamed or mapped to physical register RF6. The retirement field 316 for u0 is cleared to a zero (0) and will remain cleared until uop u0 retires. The RAT primary array 110 is then updated to store the pointer to the new RF entry (RF6) allocated to register A, shown as line 524.

[0076] Also, in step 2, the allocation pointer (A) 324 is moved one entry to the left to indicate that new RF entry 6 (RF6) has been allocated as the new Pdst for register A. Also, the retire field (R) 316 is cleared to zero to indicate that this uop is not yet retired, and thus, both the old RF entry (RF1) and the new RF entry (RF6) are unavailable (allocated).

[0077] Therefore, it can be seen that at step 2, the history buffer 122 includes all the information (i.e., old RF entry, new RF entry, logical register) necessary to reverse or undo the logical to physical register mappings caused by issuing uop u0. The uop u0 is identified by field 504 in history buffer 122.

[0078] At some point in the future, when uop u0 is retired, ROB 130 (FIG. 1) will notify allocator 120 (FIG. 1) that u0 has been retired, and the retire field 316 will be set back to 1, which would indicate that the old RF entry (RF1) would again be free and available to be allocated to a new uop. Uops are retired in order. Thus, when uop u0 retires, any earlier uops that may have wanted the data in RF1 would have also retired. Thus, RF1 can be retired or made available or free when uop u0 retires. Uops after u0 (such as u1) will want the data in RF6 (or subsequent data) for register A, and thus, RF6 cannot yet be made available when u0 retires.

[0079] Step 3 of FIG. 5 illustrates the result of issuing uop u1 (allocating an RF pointer for u1, and updating the history buffer 122 and the primary array 110). Uop u1 is a write to logical register C. RF7 is allocated for u1. The old RF pointer (to RF3) is stored in the free/old field 312 of the second entry of the history buffer 122, line 530. The free RF pointer (in the old/free field 312 of step 2) that has been allocated to u1 (RF7) is stored in the new field 310 of the second entry of the history buffer 122, line 532. Finally, the RAT primary array 110 is then updated to store the pointer to the newest (and most speculative) RF entry or Pdst assigned to logical register C (pointer to RF7), shown as line 534. The allocation pointer (A) 324 is moved to the next (third) entry of buffer 122, and the retire field 316 for the second entry, uop u1, is cleared to zero to indicate that this uop (u1) is not yet retired.
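The three steps of FIG. 5 as described in the text (power-on reset, issuing u0, issuing u1) can be reproduced with a small model. The names `Entry`, `power_on_reset` and `allocate`, the list-index representation of the allocation pointer, and the exception used to model a stall are assumptions of this sketch, not the described hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    free_old: int               # free/old field 312
    new: Optional[int] = None   # new field 310
    ldst: Optional[str] = None  # logical destination field 314
    retire: int = 1             # retire field 316 (1 = free or retired)


def power_on_reset():
    """Step 1: RF1-RF5 map registers A-E; RF6-RF15 form the free list."""
    primary = {reg: rf for rf, reg in enumerate("ABCDE", start=1)}
    history = [Entry(free_old=rf) for rf in range(6, 16)]
    return primary, history


def allocate(primary, history, alloc_ptr, ldst):
    """Allocate the next free RF entry as the Pdst for a write to ldst."""
    entry = history[alloc_ptr]
    if entry.retire != 1:
        raise RuntimeError("no free RF entry: allocation stalls")
    new_rf = entry.free_old            # next available RF entry
    entry.new = new_rf                 # record the new mapping
    entry.free_old = primary[ldst]     # record the old mapping for undo
    entry.ldst = ldst
    entry.retire = 0                   # not yet retired
    primary[ldst] = new_rf             # primary array holds the newest state
    return (alloc_ptr + 1) % len(history)


primary, history = power_on_reset()
a = 0
a = allocate(primary, history, a, "A")   # step 2: uop u0 writes register A
a = allocate(primary, history, a, "C")   # step 3: uop u1 writes register C
```

After these two calls the model holds A mapped to RF6 and C mapped to RF7, with RF1 and RF3 saved as the old entries for u0 and u1.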

[0080] FIG. 6 is a diagram of a history buffer and a RAT primary array for three example steps according to an example embodiment of the present invention. Step 1 of FIG. 6 illustrates the results of issuing uops u2-u9. The history buffer 122 in step 1 of FIG. 6 stores a new RF pointer (field 310), an old RF pointer (field 312) and the corresponding logical register (field 314) for each of uops u0-u9. These reflect the logical to physical register mappings performed for each of uops u0-u9. For example, u4 results in the old RF entry (RF6) corresponding to register A being replaced with the new RF entry (RF10). The RAT primary array 110 in step 1 also reflects the newest or most speculative state, after allocation of an RF entry for u9. Referring to the primary array 110 in step 1 of FIG. 6, RF12 mapped to register A resulted from uop u6, RF15 mapped to logical register B resulted from uop u9, RF13 mapped to register C resulted from uop u7, RF14 mapped to logical register D resulted from uop u8, and RF8 mapped to logical register E resulted from uop u2. Note that the allocation pointer 324 progressed from right to left (u0-u9) and then back to u0. However, none of the uops u0-u9 have been retired (all 0s in the Retire field 316). Thus, the allocator 120 would at this point stall the RAT from allocating resources for any additional uops because no RF entries are available.

[0081] Step 2 of FIG. 6 illustrates the result of subsequently retiring uops u0-u9. Ones (1s) have been written to the retire field 316 for each uop, indicating that the old RF entry for each of these uops is available again for allocation. Retirement does not alter the contents of the primary array 110.

[0082] Step 3 of FIG. 6 illustrates the results of subsequently issuing uops u10-u15. For example, RF5 is allocated to u12, which is a write to logical register B. Thus, for u12 in step 3, the old value in array 110 for register B (RF15) is stored in the old/free field as the old value, line 620. The newly allocated RF pointer (RF5) is then stored in the new field 310 for u12, line 622. The newly allocated RF pointer (RF5) is then stored in the RAT primary array entry corresponding to logical register B, line 624.

[0083] FIG. 7 is a diagram of a history buffer, a RAT primary array and a RAT shadow array for three more example steps according to an example embodiment of the present invention. Step 1 of FIG. 7 continues from the end of step 3 of FIG. 6. In this example, it is assumed that uop u13 is a branch instruction. As a result, in step 1 of FIG. 7, the shadow array 112 advances from u10 to u14, which is the next uop after the branch uop (u13). Uops u10-u15 have been allocated. It can be seen that uops u10-u15 have been allocated because the retirement field 316 is cleared to zero for each of these uops. The RAT primary array 110 in step 1 of FIG. 7 also reflects the allocation up through u15 (e.g., logical register A being mapped to RF6, and register C being mapped to RF9). Because u13 is a branch uop, the uops after u13 (i.e., u14-u15) are part of a predicted path. Thus, the shadow array 112 contains (or reflects) the state of the logical registers up through the allocation for u13 (the branch uop). The shadow array 112, however, stops at u14 until the processor determines whether branch uop u13 was correctly predicted (thus indicating whether uops u14 and u15 are correct or not). Thus, the shadow array 112 is pointed at (or near) the estimate of the next predicted path failure, u13 (since shadow array 112 in step 1 reflects the state up through the allocation for uop u13, the branch instruction).

[0084] In step 1 of FIG. 7, in this example, it is assumed that branch uop u13 was mispredicted, as shown in FIG. 7, step 1. The branch uop u13 mispredicts (a mispredict is detected), and the shadow array 112 advances to the branch instruction, uop u13 (these could occur in either order).

[0085] In step 2 of FIG. 7, since the state of the shadow array 112 reflects Pdst allocations only up through the mispredicted branch instruction (uop u13), the RAT primary array 110 can recover from the mispredicted branch in one clock cycle by flash copying the information in shadow array 112 to the primary array 110. Step 2 of FIG. 7 illustrates the primary array after flash copying the information from the shadow array 112 to the primary array 110. Thus, in step 2 of FIG. 7, the primary array 110 and the shadow array 112 are identical. However, according to an embodiment, the fields and pointers in the history buffer 122 are not changed by the flash copy into the primary array 110.

[0086] In step 3 of FIG. 7, after the flash copy from the shadow array 112 into the primary array 110, the shadow pointer jumps up to the position of the allocation pointer 324. Allocator 120 allocates the next two RF entries, RF10 and RF7 (see new field, N, 310 in history buffer 122) for uops u16 and u17, respectively, and allocation pointer A 324 steps forward two uops just past u17 (as shown in step 3 of FIG. 7). In addition, as compared to step 2 of FIG. 7, the retirement pointer (R) 320 continues to step forward, one uop at a time, as the ROB 130 notifies the allocator 120 that each of uops u10-u13 have been retired. Thus, the retire field (R) 316 is set to 1 in history buffer 122 for each of uops u10-u13 because these uops have been retired, while the retire field 316 for bad uops u14-u15 is cleared to zero because these bad uops will not be retired. The retirement pointer 320 moves forward up to u14 (uops u10-u13 have now been retired). Thus, at this point, the retirement pointer 320 points to u14. Once the shadow pointer (S) 322 and the retirement pointer (R) 320 have moved past the mispredicted branch u13 (i.e., once all uops up through the mispredicted branch have been retired), the retirement pointer (R) 320 may skip over any bad uops after the mispredicted branch (or other path failure) which were allocated before the mispredicted branch was detected. These bad uops (i.e., u14-u15) are part of a mispredicted path and will never be retired (and thus should be skipped and not retired). Preferably, however, the retirement pointer walks (one or more uops per clock cycle) through all the uops (both good and bad), but the processor indicates which uops are good (and should be retired) and which uops are bad (and should not be retired). The ROB 130 can issue a false retirement indication for those bad uops (e.g., u14-u15) after the mispredicted branch (to indicate that their execution results should not be committed to architectural state). The processor can distinguish bad uops after the branch from good uops, for example, based on the jump color path field 318 (i.e., bad uops have a greater sequence number than the mispredicted branch and a jump color path that is the same as the mispredicted branch instruction u13). This is briefly explained below.

[0087] At the time the mispredicted branch was detected, resources had already been allocated for uops u14 and u15, which can be seen in step 2 of FIG. 7 because the allocation pointer 324 points just past uop u15. Thus, RF entries (i.e., RF6 and RF9, respectively) had already been allocated to u14 and u15 at the time the mispredicted branch was detected, as shown in step 1 of FIG. 7. As a result, u14 and u15 are allocated after the mispredicted branch (u13) and are part of a mispredicted path. Thus, uops u14 and u15 are bad (incorrect) and will never be retired. Because a mispredicted branch was detected, the value in the jump color path 318 will be changed beginning where the allocation pointer (A) 324 was pointing when the mispredicted branch was detected. Thus, a new value (1) is used for the jump color path field (J) 318 beginning with uops u16 and u17 to indicate that these uops are part of a different predicted path (in this case, a correct path). Thus, in history buffer 122, the jump color path field 318 is a zero (0) for u10-u15, and is a one (1) for uops u16 and u17. (The jump color path field 318 for columns after u17 is 0 because these columns are unallocated, and thus hold old data, but will be set to 1 when allocated to uop u18, etc.) According to one example, the uops u10-u15 are part of a green path (jump color path field 318 = 0), while uops u16 and u17 are part of a blue path (jump color path field 318 = 1).
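
The rule for telling bad uops from good ones can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware described here: the `Entry` class and the `is_bad` function are hypothetical names, sequence numbers are modeled as plain integers, and wraparound of the circular history buffer is ignored.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    seq: int         # uop sequence number (10 stands for u10); modeling assumption
    jump_color: int  # jump color path field (J) 318


def is_bad(entry, branch):
    """A uop is bad if it follows the mispredicted branch on the same path."""
    return entry.seq > branch.seq and entry.jump_color == branch.jump_color


# Jump color values as described for FIG. 7: 0 for u10-u15, 1 for u16-u17.
history = [Entry(seq, 0) for seq in range(10, 16)] + [Entry(16, 1), Entry(17, 1)]
branch = history[3]  # u13, the mispredicted branch

bad_uops = [f"u{e.seq}" for e in history if is_bad(e, branch)]
print(bad_uops)  # ['u14', 'u15']
```

Uops u16 and u17 follow the branch but carry the new color, so the comparison leaves them marked as good.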

[0088] FIG. 8 is a diagram of a history buffer, a RAT primary array and a RAT shadow array for three additional example steps according to another example embodiment of the present invention. Step 1 of FIG. 8 continues from the end of step 3 of FIG. 6. At step 1 of FIG. 8, the shadow array 112 advances to uop u14, the branch uop u13 mispredicts, and the shadow array 112 is flash copied into the primary array 110. Thus, the primary array 110 and the shadow array 112 in step 1 of FIG. 8 contain the same information.

[0089] At step 2 of FIG. 8, the allocation pointer (A) 324 progresses just past u17. In this example, the shadow pointer (S) 322 is advanced in sequence (one or more uops at a time) to the mispredicted branch u13, and then skipped over u14 and u15 (bad uops) to u16 based on the change in the jump color path field 318. The shadow pointer 322 then moves sequentially up to the allocation point (i.e., past u17). In addition, uop u10 retires and the retirement pointer (R) 320 moves to u11. The retirement of u11, however, generates an event (such as an interrupt), which causes the subsequent predicted path (including uops u12-u17) to fail. Even u13 is bad and should not have been executed. Thus, uops u12-u17 are now considered to be bad uops.

[0090] Step 3 of FIG. 8 will now be described. In response to detecting the event generated by uop u11 of step 2 of FIG. 8, the shadow pointer (S) 322 moves back sequentially (e.g., in order one or more uops per clock cycle) from the location of allocation pointer (A) 324 (column 805) back to the position of the retirement pointer (R) 320, which is at u11. The shadow pointer (S) 322 walks backwards sequentially through both the good uops and the bad uops. There is no problem with the shadow pointer (S) 322 moving backwards through bad uops (e.g., uops u14 and u15) because this merely restores the old values to the shadow array. Alternatively, the bad uops can be skipped.

[0091] Steps 2 and 3 together illustrate the process of moving the shadow array 112 backwards one uop to u17. The shadow pointer (S) 322 is moved backwards by copying the value (i.e., the RF pointer) in the old field 312 of each column which the shadow pointer (S) 322 traverses or passes into the appropriate logical register entry in the shadow array 112. For example, as shown in steps 2 and 3 of FIG. 8, to move the shadow pointer (S) 322 back one uop to u17, the pointer value (RF2) in the old field 312 of u17 is copied into the shadow array 112 (shown as line 812 in FIG. 8), at the location in array 112 corresponding to the logical register for u17, logical register E (shown as line 810, FIG. 8). The shadow pointer (S) 322 is accordingly shown as pointing to u17 in step 3. Thus, step 3 illustrates the history buffer 122, primary array 110 and shadow array 112 after the shadow pointer (S) 322 has moved backwards one uop to u17.

[0092] In a similar manner, the shadow pointer (S) 322 then continues moving backwards sequentially one or more uops at a time until the shadow pointer (S) 322 reaches the location of retirement pointer (R) 320 (pointing to u11), which is the uop that generated the event. The contents of the shadow array 112 are then flash copied into the primary array 110. The shadow pointer (S) 322 then jumps up to the location of the allocation pointer (A) 324. The allocator 120 then continues allocating RF entries for the next uop (i.e., u18), which is part of the correct path. The RF entry (column 805) will be allocated for uop u18, and a different value will be used in the jump color path field 318 for uops u18, u19, etc., because uops u18 and u19 are part of a new predicted path. The jump color path value for u18 can be a third value (e.g., the value 2), or can switch back to the value zero if jump color path 318 is a binary value.
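
The backward walk and the flash copy that follows it can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions: the `Column` class and `walk_back` function are hypothetical names, the arrays are modeled as dictionaries, and the history is truncated to two columns. Only the u17 values (logical register E, new entry RF7, old entry RF2) and the new entry RF10 for u16 come from the description; the logical register and old entry for u16 are made up.

```python
from dataclasses import dataclass


@dataclass
class Column:
    uop: str      # uop name, e.g. "u17"
    logical: str  # logical destination register
    new: str      # new field (N) 310: RF entry allocated to this uop
    old: str      # old field (O) 312: RF entry previously mapped


def walk_back(shadow, history, shadow_ptr, target):
    """Undo columns from shadow_ptr - 1 down to target, newest first."""
    while shadow_ptr > target:
        shadow_ptr -= 1
        col = history[shadow_ptr]
        shadow[col.logical] = col.old  # restore the previous mapping
    return shadow_ptr


history = [
    Column("u16", "B", "RF10", "RF5"),  # register B and RF5 are made-up values
    Column("u17", "E", "RF7", "RF2"),   # values taken from the description
]
shadow = {"B": "RF10", "E": "RF7"}   # shadow state just past u17
primary = {"B": "RF10", "E": "RF7"}  # primary (speculative) state

# Move the shadow pointer back one uop, from just past u17 to u17.
shadow_ptr = walk_back(shadow, history, shadow_ptr=2, target=1)
print(shadow)  # {'B': 'RF10', 'E': 'RF2'}

# Stand-in for the flash copy of the shadow array into the primary array.
primary = dict(shadow)
```

In the example of FIG. 8 the walk would continue back to u11 before the flash copy is made; the sketch copies after a single step only to keep the data small.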

[0093] According to an embodiment, uop u11 does not retire due to the event (i.e., u11 is a bad uop due to the event). Retirement pointer (R) 320 must be moved forward to u18, which is the next uop that will be retired (u11-u17 are bad uops). Because uops u11-u17 are bad uops, these uops will not be retired and their corresponding retire fields 316 will each remain a zero (0), indicating not retired. There are different ways to move the retirement pointer (R) 320 forward to u18. Uops u11-u17 are all bad uops, due to uop u11 which generated an event (at the time allocation pointer A was pointed at the uop or column 805 immediately after u17). Uop u17 was the last uop allocated when the event at u11 was detected. Therefore, uops u11-u17 are all bad or incorrect uops, and will not be retired.

[0094] According to one embodiment of the invention, the ROB 130 realizes that uop u11 is a bad uop and sends the allocator 120 a bogus or false retirement indication for uop u11, causing the retirement pointer 320 to move from u11 to u12. The same is done to move the retirement pointer 320 from u12 to u13. The ROB 130 could issue false retirement indications for each uop between the event and the next branch instruction (e.g., issue false or bogus retirement indications for uops u11 and u12 in this example). Once the retirement pointer (R) 320 reaches this next branch instruction u13, the retirement pointer (R) 320 then skips over the uops with the jump color path field (0) that is the same as the branch (u13) to u16. Additional false retirement indications are then issued to move the retirement pointer (R) 320 to u18, which is the next correct instruction that will actually be correctly retired. According to another embodiment, the ROB 130 sequentially issues false retirement indications for each of u13-u17, moving the retirement pointer 320 to u18. According to yet another embodiment of the invention, a third distinct value (i.e., 2) can be used in the jump color path field for the new (correct) path of uops u18, u19, etc. This can be, for example, referred to as the purple path, and is associated with the present location of the allocation pointer 324. The retirement pointer (R) 320 would then jump ahead to where the allocation pointer (A) 324 is pointing (e.g., R jumps ahead until it reaches the value in the jump color path field associated with the position of the allocation pointer 324). Other techniques can be used to move the retirement pointer (R) 320 to the next uop to be retired (e.g., to uop u18). However, the retire field 316 for each of the incorrect uops will remain cleared or zero because these incorrect uops will not be validly retired (but these incorrect or bad uops may generate the bogus retirement indication to move the retirement pointer forward).
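
As a rough illustration of the embodiment in which the ROB issues a false retirement indication for every bad uop, the following Python sketch steps a retirement pointer from u10 to u18. The function `step_retirement` and the list-based retire field are modeling assumptions, not structures named in this description; the alternative embodiments that skip by jump color are not modeled.

```python
def step_retirement(retire_field, retire_ptr, valid):
    """Advance the retirement pointer by one column.

    valid=True models a real retirement indication; valid=False models a
    false (bogus) indication that moves the pointer without retiring.
    """
    if valid:
        retire_field[retire_ptr] = 1
    return retire_ptr + 1


uops = [f"u{n}" for n in range(10, 19)]   # u10 .. u18
bad = {f"u{n}" for n in range(11, 18)}    # u11 .. u17 are bad after the event
retire_field = [0] * len(uops)            # retire field 316, one per column

retire_ptr = step_retirement(retire_field, 0, valid=True)  # u10 retires
while uops[retire_ptr] in bad:                              # false indications
    retire_ptr = step_retirement(retire_field, retire_ptr, valid=False)

print(uops[retire_ptr])  # u18
print(retire_field)      # [1, 0, 0, 0, 0, 0, 0, 0, 0]
```

The retire field stays at zero for u11-u17 even though the pointer has moved past them, which matches the last sentence of the paragraph above.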

[0095] According to an embodiment of the invention, the retirement pointer 320 steps through all uops (both good and bad) after a mispredicted branch occurs, and the processor may use false retirement indications for those bad uops. However, if an event occurs that is not a mispredicted branch (e.g., trap, interrupt), the retirement pointer 320 may then jump up to the location of the allocation pointer 324 after a flash copy is performed from the shadow array 112 into the primary array 110.

[0096] In general, there may be two types of events: a trap and a fault. If an instruction causes a fault, the instruction will not be retired. However, if an instruction causes a trap, the instruction will be retired (and a 1 will be written to the retire field 316 for the instruction). Therefore, in the example of step 3 of FIG. 8 described above, the uop u11 generated a fault type of event because u11 was not retired (the processor issued a bogus retirement indication for u11).

[0097] A brief explanation will now be provided which describes one way in which RF entries are reallocated for new uops. The history buffer shown in FIG. 8 is ten entries wide and may be considered to be a circular buffer, as an example. The allocator 120 (FIG. 1) allocates RF entries for each new uop. According to an embodiment, the allocator 120 can allocate an entry from the oldest uop in the history buffer 122. For example, after uops u18 and u19 are allocated, uop u20 must be allocated from the RF entries listed in the new and old fields of the next column in the history buffer 122 (i.e., column for uop u10). The allocator 120 will select the new RF entry 310 or the old RF entry 312 from a column in the history buffer 122 to be allocated to the new uop, depending on the value of the corresponding retire field 316 of that column. If the retire field 316 is a 1 (indicating that this previous uop was validly retired), the old RF entry 312 is allocated to the new uop. The old RF entry is reallocated when the previous uop is retired because uops are retired in order and there are no other uops which will need this old data (data in the old RF entry). Newer uops may still need the new data (data in the new RF entry). On the other hand, if the retire field is a 0 (indicating that the uop was never retired), the allocator will reallocate the new RF entry 310 in the column of the history buffer 122. This is because the new RF entry contains bad or incorrect data which will not be needed by any uops (and thus can be reallocated), while the old RF entry contains the correct data which may be needed by other uops.

[0098] As an example, assume that RF entries have been allocated for uops u10-u19. The allocator 120 is now ready to allocate an RF entry for uop u20, and the allocator 120 will select one RF entry from the column corresponding to previous uop u10. The retire field 316 is a 1 for uop u10 as shown at the bottom of FIG. 8. This indicates that uop u10 was retired, and the old RF entry (RF8 in this example) will be allocated for uop u20. For uop u21, it can be seen that the next column corresponds to u11. The retire field 316 for u11 is a zero which indicates that u11 was not retired. Thus, the new RF entry (RF3) from u11 at the bottom of FIG. 8 will be allocated to u21.
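
The selection rule in this example can be written as a short Python sketch. The `Column` class and `entry_to_reallocate` function are hypothetical names used only for illustration. The values RF8 (old entry of u10) and RF3 (new entry of u11) come from the example above; the other two RF entries are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Column:
    uop: str
    new: str     # new RF entry 310
    old: str     # old RF entry 312
    retire: int  # retire field 316 (1 = validly retired, 0 = never retired)


def entry_to_reallocate(column):
    """Return the RF entry of an old column that is safe to hand to a new uop."""
    # Retired uop: its old mapping is dead, and its new data may still be needed.
    # Never-retired uop: its new data is bad, and the old mapping is still live.
    return column.old if column.retire == 1 else column.new


u10 = Column("u10", new="RF1", old="RF8", retire=1)  # RF1 is a made-up value
u11 = Column("u11", new="RF3", old="RF4", retire=0)  # RF4 is a made-up value

print(entry_to_reallocate(u10))  # RF8, allocated to u20
print(entry_to_reallocate(u11))  # RF3, allocated to u21
```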

[0099] The particular register allocation/deallocation techniques described herein are demonstrative. Neither these nor any other specific register allocation/deallocation techniques may be required for the present invention. Alternative known or otherwise available register allocation and/or deallocation techniques may be used.

[0100] Several embodiments of the present invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention. For example, while the present invention has been described with reference to the above-described history buffer, a wide variety of techniques or buffer formats can be used to keep track of the historical allocation of physical registers for each uop.

What is claimed is:
 1. An apparatus for allowing a processor to recover from a failure of a predicted path of instructions comprising: a plurality of physical registers, each physical register to store either architectural data or speculative data; a primary array to store a speculative state of the processor including mappings from logical registers to physical registers; and a buffer coupled to the primary array to store information identifying which physical registers store architectural data.
 2. A method of allocating registers in a speculative processor comprising the steps of: receiving an instruction; allocating a physical register for an execution result of the instruction; storing information in a buffer indicating whether an execution result stored in the allocated physical register is architectural data or speculative data.
 3. An apparatus for allowing a processor to recover from a failure of a predicted path of instructions comprising: a plurality of physical registers; a primary array storing a first speculative state of the processor including a mapping from logical registers to physical registers, a secondary array coupled to the primary array, the secondary array storing a second speculative state including a mapping from logical registers to physical registers, the second speculative state being behind or earlier than the primary speculative state; and a history buffer coupled to the secondary array and storing historical physical register to logical register mappings performed for each of a plurality of instructions.
 4. The apparatus of claim 3 wherein the secondary array being movable to a particular speculative state based on the mappings stored in the history buffer, wherein the secondary array can be copied to the primary array when a failure is detected in a predicted path of instructions to allow the processor to recover from the failure in the predicted path.
 5. The apparatus of claim 3 wherein the history buffer identifies the following information for each instruction: a logical register that is the logical destination for the execution results of the instruction; a new physical register selected from available physical registers and that is allocated as the physical destination for the execution results of the instruction, the new physical register being mapped to the logical register; and an old physical register previously mapped to the logical register.
 6. The apparatus of claim 5 wherein the predicted path failure comprises at least one of the following: a mispredicted branch instruction; an instruction that generated a fault, and an instruction that generated a trap.
 7. The apparatus of claim 3 wherein the primary array includes a pointer to a physical register for each logical register.
 8. The apparatus of claim 3 wherein the secondary array includes a pointer to a physical register for each logical register.
 9. An apparatus for allowing a processor to recover from a failure of a predicted path of instructions comprising: a plurality of physical registers; a primary array storing a mapping from logical registers to physical registers, the primary array storing a current speculative state of the processor; an allocator allocating an available physical register as the physical destination for storing the execution results of an instruction, the allocated physical register corresponding to a logical register; a history buffer coupled to the secondary array and storing historical physical register to logical register mappings performed for each of a plurality of instructions of a predicted path; a secondary array coupled to the primary array and the history buffer, the secondary array storing a secondary speculative state of the processor including a mapping from logical registers to physical registers, the secondary array being movable to any instruction between the point of physical register allocation and retirement based on the history buffer, the secondary array being movable to any location or instruction independent of which instructions have been executed; wherein the secondary array can be copied to the primary array to allow the processor to recover from the failure in the predicted path.
 10. The apparatus of claim 9 wherein said secondary array is moved to an estimated location of a next failure in the predicted path, the secondary array being adjusted to a correct state if necessary and then copied into the primary array if a path failure occurs to allow the primary array to recover from the path failure to the correct state.
 11. The apparatus of claim 9 wherein the processor attempts to maintain the secondary array at an estimate of the next mispredicted branch instruction or other instruction which may generate an event that would result in a failure of the subsequent path.
 12. The apparatus of claim 9 wherein the secondary array comprises a plurality of secondary arrays, at least some of the secondary arrays being located at different locations, one of the secondary arrays being selected and copied into the primary array if predicted path failure occurs to restore the primary array to a correct state.
 13. The apparatus of claim 11 wherein each of the secondary arrays is located at a different location in the history buffer.
 14. The apparatus of claim 9 wherein the history buffer includes path information that allows the processor to distinguish between: a) register mappings for any instructions after a path failure which are part of failed or incorrect path which should not be executed, and b) register mappings for instructions after a path failure which are part of a correct path.
 15. The apparatus of claim 9 wherein the history buffer includes a list of free or available physical registers.
 16. The apparatus of claim 9 wherein the history buffer comprises one or more pointers to the history buffer, including: an allocation pointer identifying the next available physical register to be allocated for the next instruction; a retirement pointer identifying the entry in the history buffer corresponding to the next instruction to be retired; and a secondary pointer identifying the current location or state of the secondary array.
 17. The apparatus of claim 9 wherein the secondary array comprises a plurality of secondary arrays, each secondary array storing a secondary speculative state of the processor including a mapping from logical registers to physical registers, each of the secondary arrays being movable to any instruction between the point of physical register allocation and retirement based on the history buffer, each of the secondary arrays being movable to any location or instruction independent of which instructions have been executed; wherein one of the secondary arrays can be copied to the primary array to allow the processor to recover from the failure in the predicted path.
 18. The apparatus of claim 17 wherein each of the secondary arrays storing a speculative state at an estimate of where a path failure is likely to occur, the apparatus using branch prediction logic to identify estimates where the path failures are likely to occur.
 19. A method of recovering from a failure in a predicted path in a processor, the processor including a plurality of physical registers and a plurality of logical registers, the method comprising: receiving an instruction; allocating an available physical register as the physical destination for the instruction, the allocated physical register corresponding to a logical register; storing in a primary array a current mapping between the logical register and the currently allocated physical register; storing in a history buffer the logical register, the new allocated physical register corresponding to the logical register, and an old physical register previously corresponding to the logical register; moving, based on the register mappings stored in the history buffer, a secondary array to a location in the predicted path of instructions where a failure may occur; detecting whether a path failure occurs near the location of the secondary array; copying the secondary array to the primary array if a predicted path failure occurs near the location of the secondary array to allow the primary array of the processor to recover from the path failure to a correct state.
 20. The method of claim 19 wherein said step of moving the secondary array comprises moving, based on the register mappings stored in the history buffer, a secondary array to a next mispredicted branch instruction or other instruction where an event may be generated.
 21. The method of claim 19 wherein said step of moving the secondary array comprises moving, based on the register mappings stored in the history buffer, a secondary array to a next branch instruction.
 22. The method of claim 19 wherein said step of moving the secondary array comprises moving, based on the register mappings stored in the history buffer, a secondary array to a location in the predicted path of instructions where a failure may occur, the step of moving being performed independent of which instructions have been executed.
 23. The method of claim 19 wherein said step of detecting whether a path failure occurs near the location of the secondary array comprises detecting whether a mispredicted branch or other event occurs at or near the location of the secondary array.
 24. A method of a register alias table recovering from a failure in a predicted path in a processor, the processor including a plurality of physical registers and a plurality of logical registers, the method comprising: using a primary array to store a current speculative state of logical register to physical register mappings; using a history buffer to store old and new register mappings that allow each of several instructions in a predicted path to be performed or reversed; using a secondary array to store a secondary speculative state of logical register to physical register mappings at a location in the predicted path where a path failure may occur; and copying the secondary array to the primary array if a path failure occurs at or near the location where the secondary array is located.
 25. A register file provided in a processor, the register file including an array of the physical registers, the register file to store both temporary or unretired data as well as permanent or retired data, information being stored in the processor to identify whether a physical register in the register file stores temporary or permanent data.
 26. A register renaming processor comprising a unified speculative and committed space.
 27. The register renaming processor of claim 26 and further comprising: logic to store a plurality of states including a speculative state and at least one shadow state, each of said plurality of states including mappings from logical registers to physical registers in the plurality of registers.
 28. An apparatus comprising: a plurality of registers, each one of said plurality of registers to store either architecturally committed data or speculative data; logic to store a plurality of states including a speculative state and at least one shadow state, each of said plurality of states including mappings from logical registers to physical registers in said plurality of registers.
 29. The apparatus of claim 28 and further comprising: logic to advance a plurality of pointers, one of said plurality of pointers being an allocation pointer that indicates a primary state that is advanced by speculatively executed instructions, one of said plurality of pointers being a shadow pointer that indicates a shadow state having a lesser degree of speculation than said primary state.
 30. The apparatus of claim 29 and further comprising: speculation recovery logic to copy said shadow state to said primary state to undo at least a portion of completed speculative execution.
 31. The apparatus of claim 29 wherein said logic to advance said plurality of pointers is capable of independently advancing each of said plurality of pointers.
 32. The apparatus of claim 30 wherein said at least one shadow state comprises a plurality of shadow states, each of said plurality of shadow states reflecting a different degree of speculation that is lesser than said primary state, and wherein said speculation recovery logic is capable of copying any of said plurality of shadow states to said primary state to undo speculative execution.
 33. The apparatus of claim 28 wherein said logic to store a plurality of states comprises: a primary array to store said speculative state; a secondary array to store said shadow state; and logic to update said primary array upon retirement of an instruction.
 34. The apparatus of claim 33 wherein the logic to update said primary array comprises a history buffer coupled to the secondary array, said history buffer to store historical physical register to logical register mappings performed for each of a plurality of instructions of a predicted path.